-
Notifications
You must be signed in to change notification settings - Fork 3.5k
[MIGraphX EP] Fix dangling graph outputs nodes being promoted to output nodes when graph not split #26224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[MIGraphX EP] Fix dangling graph outputs nodes being promoted to output nodes when graph not split #26224
Conversation
cc @tianleiwu @snnn bugfix for MIGraphX EP |
if (output.second->Exists()) { | ||
auto name = output.second->Name(); | ||
if (std::find(graph_output_names.begin(), graph_output_names.end(), name) == graph_output_names.end()) { | ||
// if graph is split we dont know if output is used so we need this, otherwise if the graph isn't split |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the graph is split, can't we check if an output is used by checking if it has an external consumer? Wouldn't it be better to build this check into the logic above so that we don't have to add another parameter into the function call and complicate the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically want to run the entire model through MIGraphx instead of adding memcopies between another EP (say CPU EP) since that adds overhead.
Sure the logic can be added later but in the "no partition" case, why bother checking every node? We know the entire graphs input and outputs from the original metadata and then can confidently say those nodes have no use an can be pruned.
For the split case, you can check but again going back to the first piece ( we want to minimize fallback) it would be a signal for us to ensure all the operators in the graph are supported and add the support in MIGraphX for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. This approach reduces the overhead of looping through the edges of every node.
One concern: this approach will promote internal dangling outputs to subgraph outputs in the "partition" case, which will then be computed and allocated when they should have been pruned in this step, leading to a waste of memory and computation later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because the flag for the unsupported nodes will be nonzero, and then we revert back to the previous behavior as if the is_graph_split check wasn't set anymore.
The issue isn't just that we have dangling output edges from internal nodes, its that in the full case they're interpreted as a valid output to be promoted to the graph output, modifying the metadata. In this case too there are 0 fusions being done that we saw in the model.
In the split case, yes you need to check if that node output edge is consumed before determine if its dangling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand what you mean.
No, because the flag for the unsupported nodes will be nonzero, and then we revert back to the previous behavior > as if the is_graph_split check wasn't set anymore.
By "previous behavior," you mean the behavior before this PR, which promoted dangling outputs from individual nodes to outputs of the full subgraph?
I assume that is what you mean because you say this:
In the split case, yes you need to check if that node output edge is consumed before determine if its dangling.
But then, given that in this PR we are not checking "if that node output edge is consumed before determin[ing] if it's dangling" that means we have incorrect logic in the split case in this PR.
Specifically, is_graph_split
will evaluate to true and promote internal dangling outputs of nodes in this subgraph to subgraph outputs. Although these dangling outputs won't be promoted to model outputs, they will still be treated as outputs of the subgraph when they should have been pruned.
The issue isn't just that we have dangling output edges from internal nodes, its that in the full case they're >interpreted as a valid output to be promoted to the graph output, modifying the metadata. In this case too there >are 0 fusions being done that we saw in the model.
I agree that there are two issues:
1). Dangling outputs promoted to full model outputs,
2). dangling outputs promoted to fused node outputs
This solves 1, but keeps problem 2 intact.
A better approach would be to solve 1 and 2 with one modification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this isn't about "better" its about "good enough" here for now. I've got a customer waiting on a change for a fully parsed model so case 2 isn't valid. If we hit case 2 we typically just let MIGraphX do the optimizations since we turn off all optimizations and let MIGraphX handle it as well or add in the missing parser op support. infact similar to other EPs the intent is that case 1 is the default case and support be added so we don't have to split the graph ever.
@snnn any comments on this? We've put this for ROCm 7.1 builds right now. |
@snnn if this is good enough for ORT, let us know. If you would prefer a refactoring that makes use of |
You can still do the refactor and I can test/take a look. These aren't mutually exclusive and you're welcome to contribute since we're open source. I'd likely test this refactor and mirror the change on the AMD side to our internal branch if it improves things. |
Please merge latest main, and also run lintrunner to format the code. |
Description
Fixes a case where full graph passed to GetSubGraph() promotes intermediate dangling node outputs as the final graph output.
Motivation and Context
Fixes this as was seen in a customer model for us. Refereced to MIGraphX EP change here:ROCm#179